Abstract. We suggest a new greedy strategy for convex optimization
in Banach spaces and prove its convergence rates under a suitable behavior
of the modulus of uniform smoothness of the objective function.
1. Introduction

The main goal in convex optimization is the development and analysis of 
algorithms for solving the problem 

(1.1)    $\inf_{x \in \Omega} E(x),$

where $E$ is a given convex function and $\Omega$ is a bounded convex subset of a
Banach space $X$. The function $E$ is called the objective function and satisfies the convexity
condition

$E(\gamma x + \delta y) \le \gamma E(x) + \delta E(y), \quad x, y \in \Omega, \quad \gamma, \delta \ge 0, \quad \gamma + \delta = 1.$

While classical convex optimization deals with objective functions $E$
defined on subsets $\Omega$ of $\mathbb{R}^n$ for moderate values of $n$, see [2], some of the
new applications require that the dimension $n$ be quite large or even $\infty$.
The design of algorithms for such cases is quite challenging, since typical
convergence results involve $n$ and therefore deteriorate severely with the
growth of $n$. This is the so-called curse of dimensionality. Recently, there
has been an increased interest, see @111, in developing greedy based 
strategies for solving (1.1) with provable convergence rates depending only on
the properties of $E$ and not on the dimension of the underlying space. These
algorithms provide approximations $\{E(x_m)\}$, $m = 1, 2, \ldots$, to the solution
of (1.1), with $x_m$ being a linear combination of $m$ elements from a given
dictionary $\mathcal{D} \subset X$. A dictionary is any set $\mathcal{D}$ of norm one elements from $X$
whose span is dense in $X$. An example of a dictionary is any Schauder basis
for $X$, or a union of several Schauder bases. The current algorithms pick the
initial approximation $E(x_0)$, $x_0 = 0$, the set $\Omega$ as

$\Omega := \{x \in X : E(x) \le E(0)\},$

since the global minimum of $E$ is attained on that set, and generate a sequence
of successive approximations $E_m := E(x_m)$, $m = 1, 2, \ldots$, recursively,
using the dictionary $\mathcal{D}$. Some methods, such as the Weak Chebyshev Greedy
Algorithm, see [8], provide at Step $m$ an approximant $x_m$ to the point $\bar{x}$ at
which $E$ attains its global minimum, determined as

$x_m := \operatorname{argmin}_{x \in \operatorname{span}\{\varphi_{j_1}, \ldots, \varphi_{j_m}\}} E(x),$

where $\varphi_{j_1}, \ldots, \varphi_{j_m}$ are suitably chosen elements from $\mathcal{D}$. Others choose $x_m$
as

$x_m := \operatorname{argmin}_{\omega, \lambda \in \mathbb{R}} E(\omega x_{m-1} + \lambda \varphi_m),$

or

$x_m := \operatorname{argmin}_{\lambda \in [0,1]} E((1 - \lambda) x_{m-1} + \lambda \varphi_m),$

for suitably chosen $\varphi_m \in \mathcal{D}$, where $x_{m-1}$ is the previously generated point
(in both formulas the minimization is over the coefficients, and $x_m$ is the resulting point).
Convergence rates for these algorithms are proved to be of order $O(m^{1-q})$,
where $q$ is a parameter related to the smoothness of the objective function
$E$. Note that the last two approaches are more computationally friendly,
since they require solving two- or one-dimensional optimization problems at
each step. On the other hand, some of these algorithms work only if the
minimum of $E$ is attained in the convex hull of $\mathcal{D}$, since the approximant
$x_m$ is derived as a convex combination of $x_{m-1}$ and $\varphi_m$.

In this paper, we introduce a new greedy algorithm based on one-dimensional
optimization at each step, which does not require the solution of (1.1)
to belong to the convex hull of $\mathcal{D}$ and has a rate of convergence $O(m^{1-q})$.
This algorithm is an appropriate modification of the recently introduced
Rescaled Pure Greedy Algorithm (RPGA) for approximating functions in
Hilbert and Banach spaces, see [7]. We call it RPGA(co). The paper is
organized as follows. In Section 2 we list several definitions and known
results about convex functions. In Section 3 we present the RPGA(co)
and prove its convergence rate. The rest of the paper describes the weak
version of this algorithm.


2. Preliminaries 

Let us first recall that a function $E$ is Fréchet differentiable at $x \in \Omega$ if
there exists a bounded linear functional, denoted by $E'(x) \in X^*$, such that

$\lim_{\|h\| \to 0} \frac{|E(x + h) - E(x) - \langle E'(x), h \rangle|}{\|h\|} = 0.$

Here we use the notation $\langle F, x \rangle := F(x)$ to denote the action of the functional
$F \in X^*$ on the element $x \in X$.

The following lemmas are well known and we simply state them. 
Lemma 2.1. Let $E$ be Fréchet differentiable at each point in $\Omega$
and convex on $X$. Then, for all $x \in \Omega$ and $x' \in X$,

$\langle E'(x), x - x' \rangle \ge E(x) - E(x').$

Lemma 2.2. Let $E$ be a Fréchet differentiable convex function, defined on
a convex domain $\Omega$. Then $E$ has a global minimum at $\bar{x} \in \Omega$ if and only if
$E'(\bar{x}) = 0$.


Lemma 2.3. Let $F$ be a Fréchet differentiable function and $x^*$ be such that
$x^* = \operatorname{argmin}\{F(x) : x = t\varphi,\ t \in \mathbb{R}\}$. Then $\langle F'(x^*), x^* \rangle = 0$.
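A short justification of Lemma 2.3 can be sketched as follows, assuming $x^* = t^* \varphi$ for a fixed $\varphi \in X$ and writing $g(t) := F(t\varphi)$ (the names $g$ and $t^*$ are introduced only for this remark). Since $t^*$ minimizes the differentiable one-variable function $g$, the chain rule gives:

```latex
g'(t^*) = \langle F'(t^*\varphi), \varphi \rangle = 0
\quad\Longrightarrow\quad
\langle F'(x^*), x^* \rangle = t^* \, \langle F'(x^*), \varphi \rangle = 0 .
```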

In this paper, we consider objective functions E that satisfy the following 
two assumptions. 

• Condition 0: $E$ has Fréchet derivative $E'(x) \in X^*$ at each point
in $\Omega := \{x \in X : E(x) \le E(0)\}$, $\Omega$ is bounded, and

$\|E'(x)\| \le M_0, \quad x \in \Omega.$

• Uniform Smoothness (US): There are constants $\alpha > 0$, $M > 0$,
and $1 < q \le 2$, such that for all $x, x'$ with $\|x - x'\| \le M$, $x \in \Omega$,

$E(x') - E(x) - \langle E'(x), x' - x \rangle \le \alpha \|x' - x\|^q.$
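As an illustration of the US condition (not taken from the paper), consider the quadratic $E(x) = \tfrac12 \langle Ax, x \rangle - \langle b, x \rangle$ on $X = \mathbb{R}^n$ with the Euclidean norm, where $A$ is an assumed symmetric positive definite matrix with largest eigenvalue $\lambda_{\max}(A)$ and $b \in \mathbb{R}^n$. A direct expansion suggests that US holds with $q = 2$, $\alpha = \tfrac12 \lambda_{\max}(A)$, and any $M > 0$:

```latex
E'(x) = Ax - b, \qquad
E(x') - E(x) - \langle E'(x), x' - x \rangle
  = \tfrac12 \langle A(x' - x), x' - x \rangle
  \le \tfrac12 \lambda_{\max}(A) \, \|x' - x\|_2^2 .
```

In this example the level set $\{x : E(x) \le E(0)\}$ is a bounded ellipsoid and $E'$ is continuous, so Condition 0 is satisfied as well.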

The US condition on $E$ is closely related to a condition on the modulus
of smoothness of $E$. Recall that for a convex function $E : X \to \mathbb{R}$ and a set
$S \subset X$, the modulus of smoothness of $E$ on $S$ is defined by

$\rho(E, u) := \frac{1}{2} \sup_{x \in S, \|y\| = 1} \{E(x + uy) + E(x - uy) - 2E(x)\}, \quad u > 0,$

and the modulus of uniform smoothness of $E$ on $S$ is defined by $\rho_1 :=
\rho_1(E, u)$, where

$\rho_1 := \sup_{x \in S, \|y\| = 1, \lambda \in (0,1)} \frac{(1 - \lambda) E(x - \lambda u y) + \lambda E(x + (1 - \lambda) u y) - E(x)}{\lambda (1 - \lambda)}.$

These two moduli of smoothness are equivalent (see mi, page 205), as the 
following lemma states. 

Lemma 2.4. Let $E$ be a convex function defined on $X$, $S \subset X$, and $\rho(E, \cdot)$
and $\rho_1(E, \cdot)$ be its modulus of smoothness and modulus of uniform smoothness,
respectively. Then we have

$4 \rho(E, u/2) \le \rho_1(E, u) \le 2 \rho(E, u).$
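A one-dimensional worked example may help to see the two moduli side by side. It is only a sketch and assumes that the definitions read $\rho(E,u) = \tfrac12 \sup \{E(x+uy) + E(x-uy) - 2E(x)\}$ and $\rho_1(E,u) = \sup \{[(1-\lambda)E(x - \lambda u y) + \lambda E(x + (1-\lambda)uy) - E(x)] / (\lambda(1-\lambda))\}$. For the illustrative choice $E(x) = x^2$ on $X = S = \mathbb{R}$ (so $|y| = 1$), both moduli can be computed explicitly:

```latex
\rho(E, u) = \tfrac12 \sup_{x,\,|y| = 1} \{ (x + uy)^2 + (x - uy)^2 - 2x^2 \} = u^2 ,
\\
(1 - \lambda)(x - \lambda u y)^2 + \lambda (x + (1 - \lambda) u y)^2 - x^2 = \lambda (1 - \lambda) u^2
\quad\Longrightarrow\quad \rho_1(E, u) = u^2 ,
\\
4 \rho(E, u/2) = u^2 \;\le\; \rho_1(E, u) = u^2 \;\le\; 2u^2 = 2 \rho(E, u) .
```

In this example the left inequality of Lemma 2.4 holds with equality.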

The next lemma shows the relation between the modulus of smoothness 
and the US condition. The proof of the version cited here can be found in 
[6]. Because of this lemma, the US condition and the condition from jEHf) , 4]
on the modulus of smoothness of E are equivalent. 
Lemma 2.5. Let $E$ be a convex function defined on a Banach space $X$ and
$E$ be Fréchet differentiable on a set $S \subset X$. The following statements are
equivalent for any $q \in (1, 2]$ and $M > 0$.

• There exists $\alpha > 0$ such that for any $x \in S$, $x' \in X$, $\|x - x'\| \le M$,

(2.2)    $E(x') - E(x) - \langle E'(x), x' - x \rangle \le \alpha \|x' - x\|^q.$

• There exists $\alpha_1 > 0$ such that

(2.3)    $\rho(E, u, S) \le \alpha_1 u^q, \quad 0 < u \le M.$

Next, we introduce some notation. Let $\bar{x}$ be the solution to (1.1). We
denote by $\|\bar{x}\|_1$ its semi-norm with respect to the dictionary $\mathcal{D}$, namely

$\|\bar{x}\|_1 := \inf \left\{ \sum_{\varphi \in \mathcal{D}} |c_\varphi(\bar{x})| : \bar{x} = \sum_{\varphi \in \mathcal{D}} c_\varphi(\bar{x}) \varphi \right\},$

where the infimum is taken over all possible representations of $\bar{x}$ as a linear
combination of dictionary elements. Clearly, the point $\bar{x}$ at which $E$ attains
its global minimum belongs to the set

$\Omega := \{x : E(x) \le E(0)\},$

and in what follows we will consider the minimization problem (1.1) over
this set. Note that this is a convex set as a level set of a convex function.

Further in the paper we will use the following lemma, proved in [6]. Other
versions of this lemma have been proved in m- 

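Lemma 2.6. Let $\ell > 0$, $r > 0$, $B > 0$, and let $\{a_m\}_{m=1}^\infty$ and $\{r_m\}_{m=2}^\infty$ be
sequences of non-negative numbers satisfying the inequalities

$a_1 \le B, \quad a_{m+1} \le a_m \left(1 - \frac{r_{m+1}}{r} a_m^\ell\right), \quad m = 1, 2, \ldots.$

Then, we have

(2.4)    $a_m \le \max\{1, \ell^{-1/\ell}\} \, r^{1/\ell} \left(r B^{-\ell} + \sum_{k=2}^m r_k\right)^{-1/\ell}, \quad m = 2, 3, \ldots.$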
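As a quick numerical sanity check of Lemma 2.6, the following Python sketch iterates the recursion with equality (the slowest decay the hypothesis allows) and compares it with the right-hand side of (2.4). The bound is coded under the assumption that it reads $\max\{1, \ell^{-1/\ell}\}\, r^{1/\ell} (r B^{-\ell} + \sum_{k=2}^m r_k)^{-1/\ell}$. The function names and the sample values of $\ell$, $r$, $B$, $r_k$ are illustrative choices, not taken from the paper.

```python
def lemma_2_6_bound(ell, r, B, rs, m):
    """Right-hand side of (2.4); rs[k] stores r_k for k >= 2 (rs[0], rs[1] unused)."""
    partial_sum = sum(rs[k] for k in range(2, m + 1))
    factor = max(1.0, ell ** (-1.0 / ell))
    return factor * r ** (1.0 / ell) * (r * B ** (-ell) + partial_sum) ** (-1.0 / ell)


def extremal_sequence(ell, r, B, rs, m_max):
    """Sequence with a_1 = B and equality in the recursion; returns {m: a_m}."""
    a = {1: B}
    for m in range(1, m_max):
        a[m + 1] = a[m] * (1.0 - rs[m + 1] / r * a[m] ** ell)
    return a


# Illustrative parameters (assumptions): ell = 1/(q-1) with q = 1.5, r = 1, B = 0.5.
ell, r, B, m_max = 2.0, 1.0, 0.5, 40
rs = [0.0, 0.0] + [0.25] * (m_max - 1)  # constant r_2, ..., r_{m_max}
a = extremal_sequence(ell, r, B, rs, m_max)
for m in (2, 10, 40):
    # Each printed pair shows a_m next to the bound from (2.4).
    print(m, a[m], lemma_2_6_bound(ell, r, B, rs, m))
```

The sample values keep the factor $1 - (r_{m+1}/r) a_m^\ell$ positive, so the sequence stays non-negative as the lemma requires.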

3. The Rescaled Pure Greedy Algorithm for Convex Optimization

In this section, we describe our new algorithm with parameter $\mu$ and
dictionary $\mathcal{D}$.

RPGA(co)($\mu$, $\mathcal{D}$):

• Step 0: Define $x_0 = 0$. If $E'(x_0) = 0$, stop the algorithm and define
$x_k := x_0 = \bar{x}$, $k \ge 1$.

• Step m: Assuming $x_{m-1}$ has been defined and $E'(x_{m-1}) \ne 0$, choose
a direction $\varphi_{j_m} \in \mathcal{D}$ such that

$|\langle E'(x_{m-1}), \varphi_{j_m} \rangle| = \sup_{\varphi \in \mathcal{D}} |\langle E'(x_{m-1}), \varphi \rangle|.$

With $\hat{x}_m := x_{m-1} - \lambda_m \varphi_{j_m}$, where

$\lambda_m := \operatorname{sgn}\{\langle E'(x_{m-1}), \varphi_{j_m} \rangle\} \, (\alpha\mu)^{-\frac{1}{q-1}} |\langle E'(x_{m-1}), \varphi_{j_m} \rangle|^{\frac{1}{q-1}},$

$\hat{t}_m := \operatorname{argmin}_{t \in \mathbb{R}} E(t \hat{x}_m),$

define the next point to be

$x_m = \hat{t}_m \hat{x}_m.$

• If $E'(x_m) = 0$, stop the algorithm and define $x_k = x_m = \bar{x}$, for
$k > m$.

• If $E'(x_m) \ne 0$, proceed to Step $m + 1$.
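The steps above can be sketched in Python for a simple finite-dimensional test case. Everything specific in this sketch is an assumption made for illustration: $X = \mathbb{R}^n$ with the Euclidean norm, the standard basis as dictionary, and a quadratic objective $E(x) = \tfrac12 x^T A x - b^T x$ with $A$ symmetric positive definite, for which $q = 2$ and the line search has a closed form. The value `alpha` is half the maximal absolute row sum of $A$, which dominates $\tfrac12 \lambda_{\max}(A)$ and is therefore an admissible US constant. Since US then holds for every $M$, any $\mu > 1$ is admissible. The function names are hypothetical.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    return sum(p * s for p, s in zip(u, v))


def energy(A, b, x):
    """E(x) = 0.5 * x^T A x - b^T x."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def rpga_co_quadratic(A, b, alpha, mu, steps, tol=1e-12):
    """RPGA(co) sketch for a quadratic E, q = 2, standard-basis dictionary.

    Returns the list of iterates [x_0, x_1, ...].
    """
    n = len(b)
    x = [0.0] * n                       # Step 0: x_0 = 0
    iterates = [list(x)]
    for _ in range(steps):
        g = [p - s for p, s in zip(matvec(A, x), b)]   # E'(x) = A x - b
        j = max(range(n), key=lambda i: abs(g[i]))     # greedy direction e_j
        if abs(g[j]) <= tol:                           # E'(x) = 0: stop
            break
        lam = g[j] / (alpha * mu)       # q = 2: sgn(g_j) (alpha mu)^{-1} |g_j|
        x_hat = list(x)
        x_hat[j] -= lam                 # x_hat = x - lam * e_j
        curv = dot(x_hat, matvec(A, x_hat))
        # Exact line search: minimize E(t * x_hat) over t (closed form for a quadratic).
        t = dot(b, x_hat) / curv if curv > 0 else 0.0
        x = [t * v for v in x_hat]      # x_m = t_m * x_hat_m
        iterates.append(list(x))
    return iterates


# Illustrative data (assumptions): the minimizer x_bar is chosen first, then b = A x_bar.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x_bar = [1.0, -2.0, 3.0]
b = matvec(A, x_bar)
alpha = 0.5 * max(sum(abs(v) for v in row) for row in A)
iterates = rpga_co_quadratic(A, b, alpha, mu=2.0, steps=30)
errors = [energy(A, b, x) - energy(A, b, x_bar) for x in iterates]
print(errors[:5])
```

For a general objective the closed-form line search would have to be replaced by a numerical univariate minimizer, and the supremum over the dictionary by whatever search the dictionary permits.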

Let us observe that, because of Lemma 2.2, if $E'(x_m) = 0$ at Step $m$, the
output $x_m$ of the algorithm is the minimizer $\bar{x}$. Note that the algorithm
requires a minimization of the objective function along the one-dimensional
space $\operatorname{span}\{\hat{x}_m\}$. This univariate optimization problem is called line search
and is well studied in optimization theory, see [3]. If at Step $m$ we were to
use $\hat{x}_m$ as the next approximant and not $x_m$, which is the minimizer of $E$ along
the line generated by $\hat{x}_m$, then the algorithm would be very similar to the
EGA(C) from 0 . The author proves a convergence rate of 0(m r ), for any 
r € (0, for this algorithm under suitable conditions on the parameters. 
Note that our algorithm, which simply adds a one-dimensional optimization
at each step, makes it possible to achieve an optimal convergence rate of
$O(m^{1-q})$. Observe also that, in contrast to the other greedy algorithms from
[8] that rely on one-dimensional minimization at each step, this algorithm
provides convergence results for all $\bar{x}$, and not only for $\bar{x}$ in the convex hull
of the dictionary $\mathcal{D}$.

Notice that all outputs $x_k$ generated by the RPGA(co)($\mu$, $\mathcal{D}$) are
in $\Omega$, since $E(x_k) \le E(0)$. The following theorem is our main convergence
result.

Theorem 3.1. Let the convex function $E$ satisfy Condition 0 and the
US condition. Then, at Step $k$, the RPGA(co)($\mu$, $\mathcal{D}$) with parameter
$\mu > \max\{1, \alpha^{-1} M_0 M^{1-q}\}$, applied to $E$ and a dictionary $\mathcal{D} = \{\varphi\}$, outputs the
point $x_k$, where

$e_k := E(x_k) - E(\bar{x}) \le C_1 k^{1-q}, \quad k \ge 2,$

with $C_1 = C_1(q, \alpha, E, \mu)$.

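Proof. Clearly, we have $e_1 = E(x_1) - E(\bar{x}) \le E(0) - E(\bar{x})$. Next, we
consider Step $k$, $k = 2, 3, \ldots$, of the algorithm. Notice that $x_{k-1} \in \Omega$, since
$E(x_{k-1}) \le E(0)$. The definition of $\lambda_k$, Condition 0, and the choice of the parameter $\mu$ ensure
that

$\|\lambda_k \varphi_{j_k}\| \le |\lambda_k| = (\alpha\mu)^{-\frac{1}{q-1}} |\langle E'(x_{k-1}), \varphi_{j_k} \rangle|^{\frac{1}{q-1}} \le (\alpha\mu)^{-\frac{1}{q-1}} M_0^{\frac{1}{q-1}} < M,$

and therefore, applying the US condition to $\hat{x}_k = x_{k-1} - \lambda_k \varphi_{j_k}$ and $x_{k-1}$ gives

$E(\hat{x}_k) = E(x_{k-1} - \lambda_k \varphi_{j_k}) \le E(x_{k-1}) - \lambda_k \langle E'(x_{k-1}), \varphi_{j_k} \rangle + \alpha |\lambda_k|^q$

$= E(x_{k-1}) - \frac{\mu - 1}{\mu} (\alpha\mu)^{-\frac{1}{q-1}} |\langle E'(x_{k-1}), \varphi_{j_k} \rangle|^{\frac{q}{q-1}},$

where we use the fact that $\|\varphi_{j_k}\| \le 1$. Since $E(x_k) \le E(\hat{x}_k)$, we derive that

(3.5)    $E(x_k) \le E(x_{k-1}) - \frac{\mu - 1}{\mu} (\alpha\mu)^{-\frac{1}{q-1}} |\langle E'(x_{k-1}), \varphi_{j_k} \rangle|^{\frac{q}{q-1}}.$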

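Next, we provide a lower bound for $|\langle E'(x_{k-1}), \varphi_{j_k} \rangle|$. Let us fix $\varepsilon > 0$ and
choose a representation $\bar{x} = \sum_{\varphi \in \mathcal{D}} c_\varphi \varphi$ such that

$\sum_{\varphi \in \mathcal{D}} |c_\varphi| < \|\bar{x}\|_1 + \varepsilon.$

Since $\langle E'(x_{k-1}), x_{k-1} \rangle = 0$, because of the choice of $x_{k-1}$ and Lemma 2.3,
we have that

$\langle E'(x_{k-1}), x_{k-1} - \bar{x} \rangle = -\langle E'(x_{k-1}), \bar{x} \rangle = -\sum_{\varphi \in \mathcal{D}} c_\varphi \langle E'(x_{k-1}), \varphi \rangle$

$\le |\langle E'(x_{k-1}), \varphi_{j_k} \rangle| \sum_{\varphi \in \mathcal{D}} |c_\varphi|$

$< |\langle E'(x_{k-1}), \varphi_{j_k} \rangle| \, (\|\bar{x}\|_1 + \varepsilon),$

where we have used the choice of $\varphi_{j_k}$. We let $\varepsilon \to 0$ and obtain the inequality

(3.6)    $\langle E'(x_{k-1}), x_{k-1} - \bar{x} \rangle \le |\langle E'(x_{k-1}), \varphi_{j_k} \rangle| \, \|\bar{x}\|_1.$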

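On the other hand, Lemma 2.1 and (3.6) give that

$\|\bar{x}\|_1^{-1} e_{k-1} \le |\langle E'(x_{k-1}), \varphi_{j_k} \rangle|,$

which is the desired estimate from below for $|\langle E'(x_{k-1}), \varphi_{j_k} \rangle|$. We substitute
the latter inequality in (3.5) and derive

$E(x_k) \le E(x_{k-1}) - \frac{\mu - 1}{\mu} (\alpha\mu)^{-\frac{1}{q-1}} \|\bar{x}\|_1^{-\frac{q}{q-1}} e_{k-1}^{\frac{q}{q-1}}.$

Subtracting $E(\bar{x})$ from both sides gives

$e_k \le e_{k-1} \left(1 - \frac{\mu - 1}{\mu} (\alpha\mu)^{-\frac{1}{q-1}} \|\bar{x}\|_1^{-\frac{q}{q-1}} e_{k-1}^{\frac{1}{q-1}}\right).$

Now we apply Lemma 2.6 for the sequence of errors $\{e_k\}_{k=1}^\infty$ and

$r_k = \frac{\mu - 1}{\mu^{\frac{q}{q-1}}}, \quad \ell = \frac{1}{q-1} > 0, \quad B = E(0) - E(\bar{x}), \quad r = (\alpha \|\bar{x}\|_1^q)^{\frac{1}{q-1}},$

and derive that

$e_k \le \alpha \|\bar{x}\|_1^q \left( \left(\frac{\alpha \|\bar{x}\|_1^q}{E(0) - E(\bar{x})}\right)^{\frac{1}{q-1}} + \frac{\mu - 1}{\mu^{\frac{q}{q-1}}} (k - 1) \right)^{1-q}.$

Since $k - 1 \ge k/2$ for $k \ge 2$ and $1 - q < 0$, the right-hand side does not exceed
$C_1 k^{1-q}$ with, for example, $C_1 = \alpha \|\bar{x}\|_1^q \left(\frac{\mu - 1}{2 \mu^{q/(q-1)}}\right)^{1-q}$,
and the proof is completed.

□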
Notice that we can optimize with respect to the parameter $\mu$ and select
a specific value $\mu > \max\{1, \alpha^{-1} M_0 M^{1-q}\}$ that guarantees the best
constant in the convergence estimate.
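One way to make this choice concrete is sketched below. It assumes that the estimate in the proof of Theorem 3.1 depends on $\mu$ only through the factor $g(\mu) := (\mu - 1)\mu^{-q/(q-1)}$ multiplying $k - 1$ (the name $g$ is introduced only for this remark), so that a larger $g(\mu)$ yields a smaller error bound:

```latex
g'(\mu) = \mu^{-\frac{q}{q-1} - 1} \Bigl( \mu - \tfrac{q}{q-1} (\mu - 1) \Bigr)
        = \frac{q - \mu}{q - 1} \, \mu^{-\frac{q}{q-1} - 1},
\qquad
g'(\mu) = 0 \iff \mu = q,
\qquad
g(q) = (q - 1) \, q^{-\frac{q}{q-1}} .
```

Under this assumption, $g$ increases on $(1, q)$ and decreases on $(q, \infty)$. If $q > \max\{1, \alpha^{-1} M_0 M^{1-q}\}$, the choice $\mu = q$ is admissible and maximizes $g$. Otherwise $g$ is decreasing on the whole admissible range, and values of $\mu$ close to the threshold $\alpha^{-1} M_0 M^{1-q}$ are preferable.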


4. The Weak Rescaled Pure Greedy Algorithm for Convex Optimization

In this section, we describe the weak version of our algorithm with weakness
sequence $\{t_k\}$, $t_k \in (0, 1]$, $k = 1, 2, \ldots$, and parameter sequence $\{\mu_k\}$,
$\mu_k > \max\{1, \alpha^{-1} M_0 M^{1-q}\}$, $k = 1, 2, \ldots$. In the case when $t_k = 1$ and
$\mu_k = \mu$, $k = 1, 2, \ldots$, the WRPGA(co)($\{t_k\}$, $\{\mu_k\}$, $\mathcal{D}$) is the RPGA(co)($\mu$, $\mathcal{D}$).
The weakness sequence allows us to have some freedom in the selection of
the next direction $\varphi_{j_k}$, while the parameter sequence $\{\mu_k\}$ gives more choices
in how much to advance along the selected direction $\varphi_{j_k}$.

WRPGA(co)($\{t_k\}$, $\{\mu_k\}$, $\mathcal{D}$):

• Step 0: Define $x_0 = 0$. If $E'(x_0) = 0$, stop the algorithm and define
$x_k := x_0 = \bar{x}$, $k \ge 1$.

• Step m: Assuming $x_{m-1}$ has been defined and $E'(x_{m-1}) \ne 0$, choose
a direction $\varphi_{j_m} \in \mathcal{D}$ such that

$|\langle E'(x_{m-1}), \varphi_{j_m} \rangle| \ge t_m \sup_{\varphi \in \mathcal{D}} |\langle E'(x_{m-1}), \varphi \rangle|.$

With $\hat{x}_m := x_{m-1} - \lambda_m \varphi_{j_m}$, where

$\lambda_m := \operatorname{sgn}\{\langle E'(x_{m-1}), \varphi_{j_m} \rangle\} \, (\alpha\mu_m)^{-\frac{1}{q-1}} |\langle E'(x_{m-1}), \varphi_{j_m} \rangle|^{\frac{1}{q-1}},$

$\hat{t}_m := \operatorname{argmin}_{t \in \mathbb{R}} E(t \hat{x}_m),$

define the next point to be

$x_m = \hat{t}_m \hat{x}_m.$

• If $E'(x_m) = 0$, stop the algorithm and define $x_k = x_m = \bar{x}$, for
$k > m$.

• If $E'(x_m) \ne 0$, proceed to Step $m + 1$.
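The weak selection rule can be illustrated with a small Python sketch under the same kind of simplifying assumptions as before, none of which come from the paper: $X = \mathbb{R}^n$ with the Euclidean norm, the standard basis as dictionary, a quadratic $E(x) = \tfrac12 x^T A x - b^T x$ with $A$ symmetric positive definite ($q = 2$), and `alpha` taken as half the maximal absolute row sum of $A$. The rule "take the first coordinate that passes the threshold" is just one hypothetical way to exploit the freedom given by $t_m$, and the function names are illustrative.

```python
def matvec(A, x):
    return [sum(a * v for a, v in zip(row, x)) for row in A]


def dot(u, v):
    return sum(p * s for p, s in zip(u, v))


def energy(A, b, x):
    """E(x) = 0.5 * x^T A x - b^T x."""
    return 0.5 * dot(x, matvec(A, x)) - dot(b, x)


def wrpga_co_quadratic(A, b, alpha, ts, mus, steps, tol=1e-12):
    """WRPGA(co) sketch for a quadratic E with q = 2.

    ts[m-1] and mus[m-1] hold the weakness parameter t_m and the
    parameter mu_m used at Step m. Returns the list of iterates.
    """
    n = len(b)
    x = [0.0] * n
    iterates = [list(x)]
    for m in range(1, steps + 1):
        g = [p - s for p, s in zip(matvec(A, x), b)]   # E'(x) = A x - b
        sup = max(abs(v) for v in g)
        if sup <= tol:                                 # E'(x) = 0: stop
            break
        t_m, mu_m = ts[m - 1], mus[m - 1]
        # Weak selection: any index with |g_j| >= t_m * sup is allowed;
        # here the first such index is taken (an illustrative rule).
        j = next(i for i in range(n) if abs(g[i]) >= t_m * sup)
        lam = g[j] / (alpha * mu_m)     # q = 2: sgn(g_j) (alpha mu_m)^{-1} |g_j|
        x_hat = list(x)
        x_hat[j] -= lam
        curv = dot(x_hat, matvec(A, x_hat))
        # Exact line search along span{x_hat} (closed form for a quadratic).
        s = dot(b, x_hat) / curv if curv > 0 else 0.0
        x = [s * v for v in x_hat]
        iterates.append(list(x))
    return iterates


# Illustrative data (assumptions): minimizer chosen first, then b = A x_bar.
A = [[4.0, 1.0, 0.0], [1.0, 3.0, 1.0], [0.0, 1.0, 2.0]]
x_bar = [1.0, -2.0, 3.0]
b = matvec(A, x_bar)
alpha = 0.5 * max(sum(abs(v) for v in row) for row in A)
steps = 30
iterates = wrpga_co_quadratic(A, b, alpha, [0.5] * steps, [2.0] * steps, steps)
print([energy(A, b, x) - energy(A, b, x_bar) for x in iterates[:5]])
```

With `ts` identically 1 and `mus` constant, the threshold test selects only coordinates of maximal absolute gradient, which corresponds to the RPGA(co) selection.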

The next theorem is the main result about the convergence rate of the
WRPGA(co)($\{t_k\}$, $\{\mu_k\}$, $\mathcal{D}$).


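Theorem 4.1. Let the convex function $E$ satisfy Condition 0 and the US
condition. Then, at Step $k$, the WRPGA(co)($\{t_k\}$, $\{\mu_k\}$, $\mathcal{D}$), applied to $E$
and a dictionary $\mathcal{D} = \{\varphi\}$, outputs the point $x_k$, where

$e_k := E(x_k) - E(\bar{x}) \le \alpha \|\bar{x}\|_1^q \left( C_1 + \sum_{j=2}^k (\mu_j - 1) \left(\frac{t_j}{\mu_j}\right)^{\frac{q}{q-1}} \right)^{1-q}, \quad k \ge 2,$

with $C_1 = C_1(q, \alpha, E)$.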
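Proof. Similarly to the proof of Theorem 3.1, we have that $e_1 \le E(0) - E(\bar{x})$
and, for $k \ge 2$,

(4.7)    $E(x_k) \le E(x_{k-1}) - \frac{\mu_k - 1}{\mu_k} (\alpha\mu_k)^{-\frac{1}{q-1}} |\langle E'(x_{k-1}), \varphi_{j_k} \rangle|^{\frac{q}{q-1}}.$

In the same way, one can easily derive that

$\|\bar{x}\|_1^{-1} t_k e_{k-1} \le |\langle E'(x_{k-1}), \varphi_{j_k} \rangle|,$

and thus the estimate

$e_k \le e_{k-1} \left(1 - \frac{\mu_k - 1}{\mu_k} (\alpha\mu_k)^{-\frac{1}{q-1}} t_k^{\frac{q}{q-1}} \|\bar{x}\|_1^{-\frac{q}{q-1}} e_{k-1}^{\frac{1}{q-1}}\right).$

Now we apply Lemma 2.6 for the sequence of errors $\{e_k\}_{k=1}^\infty$ and

$r_k = (\mu_k - 1) \left(\frac{t_k}{\mu_k}\right)^{\frac{q}{q-1}}, \quad \ell = \frac{1}{q-1} > 0, \quad B = E(0) - E(\bar{x}), \quad r = (\alpha \|\bar{x}\|_1^q)^{\frac{1}{q-1}},$

and derive that

$e_k \le \alpha \|\bar{x}\|_1^q \left( \left(\frac{\alpha \|\bar{x}\|_1^q}{E(0) - E(\bar{x})}\right)^{\frac{1}{q-1}} + \sum_{j=2}^k (\mu_j - 1) \left(\frac{t_j}{\mu_j}\right)^{\frac{q}{q-1}} \right)^{1-q},$

and the proof is completed.

□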